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Abstract. The intrinsic stochasticity of gene expression can lead to large variability 
in protein levels for genetically identical cells. Such variability in protein levels 
can arise from infrequent synthesis of mRNAs which in turn give rise to bursts of 
protein expression. Protein expression occuring in bursts has indeed been observed 
experimentally and recent studies have also found evidence for transcriptional bursting, 
i.e. production of mRNAs in bursts. Given that there are distinct experimental 
techniques for quantifying the noise at different stages of gene expression, it is of interest 
to derive analytical results connecting experimental observations at different levels. 
In this work, we consider stochastic models of gene expression for which mRNA and 
protein production occurs in independent bursts. For such models, we derive analytical 
expressions connecting protein and mRNA burst distributions which show how the 
functional form of the mRNA burst distribution can be inferred from the protein 
burst distribution. Additionally, if gene expression is repressed such that observed 
protein bursts arise only from single mRNAs, we show how observations of protein 
burst distributions (repressed and unrepressed) can be used to completely determine 
the mRNA burst distribution. Assuming independent contributions from individual 
bursts, we derive analytical expressions connecting means and variances for burst and 
steady-state protein distributions. Finally, we validate our general analytical results 
by considering a specific reaction scheme involving regulation of protein bursts by 
small RNAs. For a range of parameters, we derive analytical expressions for regulated 
protein distributions that are validated using stochastic simulations. The analytical 
results obtained in this work can thus serve as useful inputs for a broad range of studies 
focusing on stochasticity in gene expression. 
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1. Introduction 

The intrinsic stochasticity of biochemical reactions has important consequences for the 
functioning of cellular processes [TJ [2]. In particular, reactions corresponding to the 
process of gene expression often involve small numbers of molecules, and can be subject 
to large fluctuations. The corresponding stochasticity in gene expression has been 
identified as a key factor underlying the observed phenotypic variability of genetically 
identical cells in homogeneous environments [3j. Quantifying the effects of intrinsic 
noise using stochastic models of gene expression is thus an important step towards 
understanding cellular function and variability. 

Several recent studies have focused on quantifying noise in gene expression using 
both single-cell assays and single-molecule techniques. Experimental observations 
of noise in steady-state protein distributions across a population of cells [3] were 
shown to be consistent with predictions from simple models based on translation from 
individual mRNAs (31 E] . These models predict that each mRNA produces a burst of 
protein that is geometrically distributed |6j. Single- molecule studies have indeed seen 
protein production occurring in bursts and determined that the corresponding protein 
burst distribution is geometric (TJ EJ [9]. At the mRNA level, single-molecule studies 
have demonstrated that mRNA production can also occur in transcriptional bursts 
[31 [7J QUI El EEl [13]. The presence or absence of transcriptional bursting indicates 
different sources of noise in gene expression and several studies are currently engaged 
in probing gene expression at multiple stages to elucidate the underlying sources of 
variability p31 QU [15]. 

Given different experimental techniques for probing stochasticity in gene expression 
using measurements at different stages (specifically steady-state and burst distributions 
for proteins and mRNAs) [HUE], it is of interest to derive analytical results connecting 
observables at different levels. These results can be used to infer information at one level 
using experiments at a different level. For example, in previous work [18] it was shown 
that experimental determination of the protein burst distribution and frequency can be 
used to determine the steady-state protein distribution. In this context, we note that 
most previous models have focused on reaction schemes which correspond to a geometric 
burst distribution for proteins produced from a single mRNA [181 HH]- However, more 
general reaction schemes for protein production from mRNAs can lead to deviations from 
geometric burst distributions for single mRNA bursts [2D] . It would thus be desirable 
to derive analytical formulae connecting burst and steady-state protein distributions 
for arbitrary protein burst distributions. Finally, we note that such analytical results 
can be used to check for consistency between the experimental results from probing 
different levels of gene expression. In particular, any observed inconsistencies could 
signal that some model assumptions are invalid, potentially leading to new insights 
about the mechanisms of gene expression. 

In this work, we analyze a class of burst models for protein production from mRNAs 
and derive analytical results connecting observable distributions at different stages of 
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gene expression. In particular, we show how the functional form of the mRNA burst 
distribution can be determined using the observed protein burst distribution. If mRNA 
transcription can be repressed such that observed protein bursts arise only from single 
mRNAs, then the derived results show how observations of protein burst distributions 
(repressed and unrepressed) can be used to completely determine the mRNA burst 
distribution. Assuming independent bursts whose arrival can be modeled as a Poisson 
process, we derive expressions connecting the mean and variance of protein burst 
distributions to the corresponding quantities for the steady-state distribution of protein 
levels across a population of cells. Finally, we consider a specific example for which 
burst distributions can deviate from the geometric distribution: post-transcriptional 
regulation of bursts by small RNAs. In the limit of low burst frequency, we derive 
analytical expressions for the protein burst distribution which are in excellent agreement 
with results from stochastic simulations. The results derived in this work can thus serve 
as useful building blocks for future studies focusing on stochasticity in gene expression. 



2. Tools, Notations, and Definitions 

A starting point of our analysis is the Master equation [2TJ 

d t P(n; t) = H(n)P(n; t), (1) 

where n = {nx} is a state vector describing the abundance of each species X in the 
system. Here P(n; t) is the probability to find the system with the state vector n after 
time t has elapsed. Equation §T§ is supplemented by the initial conditions, namely the 
initial distribution Po(n) at time t — 0. 

The generating function of the probability distribution Eq. ([T|) is defined by 

oo 

G{x- t) = J2 P ^ *) n ( 2 ) 

n;=0 i={X} 

where x = {xi} is a real vector dual to the state vector n. The generating function, in 
turn, satisfies the corresponding evolution equation and initial condition 

d t G(x; t) = U(x) G(x; t), (3) 

oo 

G(x; 0) = G (x) = Po<n)]l x ?- ^ 

n,i=0 i 

Let us first consider the simplest gene expression reaction scheme. The minimal 
model [6] of gene expression is given by the diagram on FigJTJ The corresponding 
reaction scheme is 

D^M^M + P; M^>0; P^>0; (5) 

where D is DNA, M is mRNA, and P is protein. Both mRNAs and proteins are 
synthesized at the constant rates k m and k p respectively, and their degradation (decay) 
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Figure 1. Reaction scheme for the minimal model of gene expression. A. mRNA 
transcripts are created with reaction rate k m and degraded with reaction rate /x m . 
Protein is translated from mRNA with a reaction rate k p and degraded with a reaction 
rate fi p . B. Evolution of protein distributions for the same scheme can be analyzed in 
terms of arrivals of bursts of proteins (valid when \x p << /x m ) |18) . 



rates are p m and fi p . Correspondingly, the evolution operator in the equation for the 
generating function is given by 

d 

OX m 

d d 
k p (x p -l)x m ^— + p p {l- x p ) — . (6) 

Note that generating function representation is particularly useful since it converts the 
infinite set of equations for various integer values of n in Eq. (pQ) into a single partial 
differential equation. Moreover, a calculation of any observable quantities, such as 
moments of distribution, is equivalent to evaluation of derivatives of the generating 
function at point x — {1, 1, • ■ • , 1}. 

There are several parameters that describe the dynamics of the gene expression 
models analyzed in this work. The following rules serve as general guides for the notation 
used: 

• Indices X = m,p, s stand for mRNA, protein, and sRNA species correspondingly. 

• Lower case letters are used to describe burst variable, e.g., p m denotes the 
probability distribution of mRNA burst size and g m is its generating function. 

• Capital letters are used to describe steady-state variable, e.g., P p denotes the protein 
steady-state distribution and G p is its generating function. 

Finally, we define some distributions that arise when considering bursts of gene 
expression. The geometric distribution is given by 

p{n) = (1 - u) n u, n>0 (7) 

with the corresponding generating function 

and mean given by (1 — u)/u. It is also convenient to define the conditional geometric 
distribution 

p(0) = 0, 

p{n) = (1 - w) n -y n > 1 (9) 
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with the corresponding generating function 



The conditional geometric distribution is encountered [19] when considering the 
distribution of mRNAs that give rise to a protein burst. Since the observation of a 
burst of proteins necessarily implies the presence of at least 1 mRNA, the distribution 
is conditioned accordingly. We note that the mean of the conditional distribution is 
given by 1/u. In the limit u — > 1, the distribution Eq. ([9]) describes a mRNA burst with 
exactly 1 mRNA produced per burst, which is the case when mRNA arrival corresponds 
to a Poisson process. 



3. Bursts and modeling framework 

Recent experiments have determined the variation of noise in protein expression as a 
function of mean protein abundance for several genes [22l 123] . The observed scaling 
relationship is consistent with different underlying models (see Figured]). In one case, 
the transcription rate k m is constant corresponding to a Poisson process driving mRNA 
synthesis. Another possible scenario corresponds to a Telegraph process [2lfTU] , l2^1 ,l25ll26] 
driving the creation of mRNAs. In this case, the promoter driving gene expression 
switches between active and inactive states. When the promoter is in the active state, 
multiple number of mRNAs can be transcribed. While both models are consistent with 
the experimental data, the observed scaling indicates that protein production occurs in 
infrequent bursts for many genes. 

Based on observations relating to bursts, an analytical approach [18J was 
introduced to derive expressions for steady-state protein distributions from protein burst 
distributions. Specifically, it is assumed that (i) protein degradation rate is much smaller 
than mRNA degradation rate (/x p <C // m ), (ii) protein levels vary due to independent 
bursts of protein expression in combination with changes due to protein degradation 
and (iii) the arrival of bursts can be modeled as a Poisson process. The above approach 
then reduces the problem of characterizing protein steady-state distributions into two 
parts: (i) first obtain the protein burst distribution for a single burst and (ii) using this 
burst distribution as input, derive and analyze the corresponding Master equation (see 
Fig. IB) for proteins alone [181 A mathematical justification of this procedure of 
deriving a Master equation for proteins alone, given the assumptions stated above, has 
been provided recently [27] . 

In the following sections, we will consider stochastic models of gene expression 
consistent with the assumptions stated above. Specifically, we consider models for 
which mRNA and protein production occurs in independent bursts such that the 
arrival of bursts corresponds to a Poisson process. As noted above, even for a Poisson 
process, there are parameter constraints that must be satisfied (fi p <C /i m ) for the burst 
approximation to be valid. While previous work has largely focused on reaction schemes 
that give rise to a geometric burst of proteins from a single mRNA, we will consider 
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Figure 2. (color online) Schematic representation of the Poisson and Telegraph mRNA 
transcription processes. A. The Poisson process of transcription. The DNA is always in 
the "on" state resulting in a constant production rate of mRNA transcript (green lines) . 
B. The Telegraph process of transcription. The DNA exchanges between two states, 
"off' and "on" . The "off' state corresponds to inactive DNA in which no transcripts 
are produced, while the "on" state corresponds to DNA capable of producing a burst 
of mRNA transcripts before it reverts back to the "off' state. 



the case for which the protein burst distribution can be an arbitrary function. For 
such models, we wish to derive analytical expressions which connect observations at 
different stages of gene expression (see Figure |3]). The following section first considers 
how observations of protein burst distributions can inform us about the underlying 
mRNA burst distributions. 



4. From protein to mRNA burst distribution 

We first consider the minimal scheme (Eq. (JS}) of protein production from mRNAs. For 
this scheme, the following equation relating the mRNA and protein burst distributions 
can be derived (see Appendix): 

fl 



9m(x) = g P 



X) 



[kp/ ji r , 



11 



The functions g m and g p in the equation Eq. (iTTj) are correspondingly the mRNA 
and protein burst generating functions. The dynamical version (for time dependent 
distributions) of Eq. ffTTT) can also be found in the Appendix. Note that, consistent with 
the assumption \i m ^> /x p , we ignore protein degradation during a single burst, i.e. the 
above equation considers only the proteins synthesized during the burst. 

The result Eq. (fTTT) is useful because it allows us to infer the functional form of the 
mRNA burst distribution from observations of protein burst distributions. Consider the 
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Figure 3. Different approaches for probing bursts of gene expression. A. Measuring 
the mRNA burst distributions directly. B. Measuring the protein distributions 
resulting from the mRNA burst distributions. The derived results provide means of 
connecting these two measurements at different stages of gene expression for the case 
of arbitrary protein burst distributions arising from a single sRNA. 



case that the observed protein burst distribution (e.g. as reported in [8]) is a geometric 
distribution (Eq. flTJ)) with parameter u = u p . Then, using the expression Eq. ( ITT]) , we 
obtain that the mRNA burst distribution has to be a conditional geometric distribution, 
Eq. (jl2]), with parameter 

u m = J^-^. (12) 

f^m J- ^p 

In other words, the mRNA burst distribution is given by 

P m {n) = p{u m ), n>l (13) 

While the functional form of the mRNA burst distribution is thus determined, we 
note that the precise distribution is not known since the parameter (— ) is not known. 
The upper bound for — can be derived from the condition u m = 1 

kA = l^p ; (M) 

Mm/ max u p 

which corresponds to the Poisson scenario, i.e. the observed burst distribution is 
produced from a single mRNA. On the other hand, we can have (■&■) < m max , 
which implies u m < 1 and thereby that the mean number of mRNAs in the burst (— ) 
is greater than 1. This set of parameters would be consistent with a Telegraph process 
driving mRNA creation since it produces a geometric mRNA burst distribution (with 
u m < 1) and also gives rise to a geometric protein burst distribution [T9] . 

It has been noted in previous work [JTJJ, [221 HH] that the Poisson and Telegraph 
processes cannot be distinguished by experimental observations on proteins alone, since 
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both both Poisson and Telegraph processes give rise to a geometric burst distribution 
for proteins. However, previous work did not preclude other possible mRNA burst 
distributions that can result in a geometric protein burst distribution. The preceding 
analysis demonstrates that, if the observed protein burst distribution is geometric, 
then the mRNA burst distribution has to be a conditional geometric distribution. 
Thus Eq. f lT3|) is a mathematically necessary and sufficient condition on the mRNA 
burst distribution to obtain a geometric burst distribution for proteins. An important 
corollary is that kinetic schemes which lead to non-geometric mRNA burst distributions 
can be ruled out if the observed protein burst distribution is a geometric distribution. 

Let us now consider general reaction schemes which can give rise to non-geometric 
protein burst distributions. This can occur due to interaction with a post-transcriptional 
regulator [20] or even otherwise, e.g. if we have switching between competing 
mRNA secondary structure conformations which correspondingly have different protein 
production and/or mRNA degradation rates. Another example is the case for which 
mRNA degradation is not a Poisson process but occurs in stages (termed mRNA 
senescence [28J ) ; in general the corresponding protein burst distribution will not be 
a geometric distribution. In such cases, the preceding analysis can be generalized as 
follows. Let us denote by <f>(x) the generating function of protein bursts obtained from 
a single mRNA. The number of proteins produced in a single burst can be expressed as 
the sum of a random number (N) of random variables, each of which is drawn from the 
probability distribution corresponding to <f)(x). The random variable iV corresponds to 
the number of mRNAs in the burst with generating function g m (x). Correspondingly, 
the generating function of the protein burst distribution is given by 

9 P (x) = g m [4>(x)\ , (15) 

Inversion of Eq. (1151) yields the mRNA burst distribution 

9m(z) = g P [<P~ l (z)] , (16) 

Note that Eq. ffTTT) is a special case of Eq. f|T6|) . For the minimal scheme of gene expression 
(Fig. 1), the burst distribution from a single mRNA is a geometric distribution with 
mean [6]. Correspondingly, the generating function is given by <j>{x) = xri_ u \ x , with 
u = k ^ m . Inversion of <p(x) in combination with Eq. (I16p gives Eq. (11 ip . 

The significance of the above equations is that once 4>(x) is determined, the mRNA 
burst distribution can be inferred from the observed protein burst distribution (and 
vice- versa). Recent experiments [8] have shown that repressors can be used to regulate 
gene expression such that each observed burst corresponds to proteins produced from 
a single mRNA. Such experiments can be used to determine the single mRNA burst 
distribution and hence <p(x). Thus, if the protein burst distributions can be observed for 
both scenarios, with and without the repressor, then Eq. (TLB"]) can be used to completely 
determine the mRNA burst distribution. 
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5. Connecting Burst and Steady-State Distributions 

While the direct observation of protein expression bursts has been demonstrated 
experimentally [3 EJ |9] ; in general, carrying out such experiments is challenging. Since 
steady-state protein distributions are less challenging to determine experimentally, it is 
of interest to derive results connecting burst and steady-state distributions, in particular 
connecting the means and variances. We note that recent work [28J has derived results 
connecting burst and steady-state variances for general models of gene expression, 
in particular for models such that the waiting-time distribution between bursts can 
arbitrary, as opposed to the simple exponential distribution which corresponds to a 
Poisson process for burst arrival. In the following, we first focus on the case of Poisson 
arrivals for bursts. 

As discussed in Section [3J we assume that each burst can be considered as an 
independent realization of the same stochastic process and that burst arrival can be 
modeled as a Poisson process. Let us denote by P b {n) the probability that n proteins 
are produced during a single burst. Correspondingly, the Master equation for the protein 
distribution at time t (P(n,t)) is [29J: 

d t P(n, t) = n p [P{n + l,t)- P(n, t)} 

oo 

+ k b J2 [Pb{n')P{n - ri, t) - P b (n')P(n, t)} (17) 

n'=0 

The parameter kb is the constant rate of burst arrival, i.e. it is the inverse of the mean 
time between two sequential bursts. If each burst corresponds to proteins produced 
from a single mRNA, then kb is identical to the mRNA creation rate k m . 
Let us define the generating functions: 

oo 

G b (x) = J2^Pb(n), (18) 

n=0 

oo 

G(x,t) = ^x n P(n,t) (19) 

n=0 

Correspondingly, the evolution equation for the generating function is [29] 

d t G{x, t) = /i p (l - x)d x G(x, t) + k b [G b {x) - 1] G{x, t), (20) 

The time dependent solution of Eq. ( 1201) can be obtained by the method of 
characteristics. The steady-state limit (G s (x)) is given by [29] . 

GAx)=ew itir(m^i) dy ). (21) 



.fhJi \ y~ l 

From Eq. (|2ip we can also derive useful expressions for the mean and the Fano 
factor (or noise strength) of the steady-state distribution in terms of the corresponding 
quantities for burst distribution. We obtain: 

n s = (—) n b , (22) 
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=l + n 6 + ~M 
n s 2 V n 6 



[1 + ^ 



(23) 



If the burst distribution is geometric, we have ^- = 1 + n b and the above result reduces 
to previously obtained results [21 [27] in the limit \i m ^> ji p . For general reaction 
schemes, the burst distribution differs from the geometric distribution and Eq. ( 123]) is 
the generalization that connects burst and steady-state distributions. It is interesting 
to note that a similar result was obtained in previous work [28] with different model 
assumptions: specifically, each mRNA was assumed to produce a geometric burst of 
proteins, however the number of mRNAs in the burst was assumed to be drawn from 
an arbitrary burst distribution. 

The preceding discussion focused on the case that the burst arrival is a Poisson 
process, thus the waiting-time distribution between bursts is given by an exponential 
distribution. For a Poisson process driving mRNA production this is certainly the case. 
However, mRNA production has also been proposed to arise from a Telegraph process 
[TOl [T3~] which, in general, does not have the feature that the waiting-time between 
bursts is an exponential distribution [26] . We consider the case that the DNA fluctuates 
between two different conformations which correspond to different production rates for 
the mRNAs. In particular, we consider a two-stage model [10] corresponding to two 
different active confirmations of DNA ("on" and "off" or 1 and 2 say) , with mRNA 
transcription rates k\ and h%{< k\) respectively, Note that previous work has focused 
on the case k 2 = 0, i.e. no transcription in the "off" state. The present results generalize 
this model to allow for a basal level of transcription in the "off" state as well. Let us 
define a parameter / = |^ which is the ratio of the two rates and takes values between 
and 1. We denote by A12 the rate of switching from conformation 1 to conformation 
2, and A 2 i is the rate for the reversed process. For this two-stage model, an analytical 
expression linking the burst distribution to the steady-state distribution (analogous to 
Eq. ( l2TT) ) seems intractable. However, we can derive expressions for steady-state mean 
and variance (see Appendix). We obtain: 



k _ 
— n b , 
Hp 



(24) 



n s 2 \ n b 



+ 1 + 



A12 + A 



21 



ftp 



where we defined 
A 2 ifci 



k 



Al2^2 



(A12 + A21) 



-1 



Al2 
A2I 



/ 



1 + 



(25) 



(26) 



Note that for the case / = 0, the above formula reduces to previously obtained 
results [21 [21] , whereas for / = 1 we recover Eq. f[23|) . The above result thus generalizes 
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previously obtained results for the case of nonzero / and for arbitrary protein burst 
distributions. 

6. Burst Distribution for Regulation by Small RNAs 



The preceding sections derived general results connecting burst and steady-state 
distributions for burst distributions which can deviate from a geometric distribution. 
We now consider a specific regulation scheme that can give rise to non-geometric 
protein burst distributions: regulation by small RNAs. Small RNAs are genes that 
are transcribed but not translated, i.e. they are non-coding RNAs. In bacteria, small 
RNAs have been studied extensively in recent years [30] in part due to the critical 
roles they play in cellular post-transcriptional regulation in response to environmental 
changes. 
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Figure 4. The sRNA-mRNA regulation scheme. The sRNA production rate is k s and 
the degradation is (i s . The interaction rate between the sRNA and mRNA that results 
in mutual degradation is 7. The mRNA and Protein reaction rates are the same as 
shown in figure [I] 



The reaction scheme for small RNA based regulation has been studied by several 
groups [3TJ [32J E3J EH] and is schematically represented in Figure HJ In the limit of 
large concentrations of the small RNA regulator, the fluctuations of the small RNAs 
can be neglected and a more general model can be analyzed [20J. However, when 
the fluctuations of the regulator cannot be neglected, the exact solution of the model 
represented in Figure H] is analytically intractable and approximations schemes need to 
be employed. In the following, we show how, in the limit of infrequent protein bursts, 
an analytical expression for the generating function of the protein burst distribution can 
be derived which agrees well with simulations. 
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We consider the case that mRNA production is governed by a Poisson process 
with constant rate k m . In the limit of low k m , the small RNA distribution prior to 
each burst can be well approximated by the unregulated small RNA distribution, which 
corresponds to a Poisson distribution with mean n s = — . With these approximations, 
it is possible to derive an expression for the regulated protein burst distribution due to 
interaction with small RNAs as shown below. 

Let us begin with the initial condition (t = 0) corresponding to the arrival of a 
mRNA. The protein burst distribution corresponds to the number of proteins produced 
from this single mRNA until the time it is degraded, either naturally or due to interaction 
with small RNAs. Our approach will focus on first deriving an expression for the survival 
probability of the mRNA at time t (S(t)). Let us define Pi(n,t) as the probability that 
the mRNA exists at time t (i.e. it has not been degraded) and the number of sRNAs is 
n. Then, the mRNA survival probability is given by S(t) = Yl^Lo^i 71 ^)- Let us now 
define the operator H s which acts as follows 

H s P(n) = k s [P(n - 1) - P(n)} + /2 S [{n + l)P(n + 1) - nP(n)\ . (27) 
In terms of this operator, we can write down the Master equation for P\(n, t) as follows 

dtP x {n) = H s Px(n) - fimPxin) - ynP x {n), (28) 
The corresponding initial condition is taken as 

Pi(n,t = 0) = e- n °^r, (29) 
nl 

(30) 

where n s = (k s / /x s ) (i.e., Poisson distribution of sRNAs at time t = 0) as discussed 
above. 

In order to solve the Eq. (12 8 j) let us once again define a generating function 

oo 

G 1 (x,t) = J2^ n Pi(n,t), (31) 

71=0 

which satisfies the partial differential equation 

d t Gi(x, t) = (H s -Hm- jxd x )G t (x, t), (32) 
G 1 (x,0) =exp(n s (x- 1)). (33) 

Here the differential operator H s can be easily derived from the equation Eq. (12 7p . 
namely H s = (x — l)(k s — fJ, s d x ). The value of the generating function Gi(x,t) at point 
x = 1 corresponds to Y^=o Pi( n i 0j i- e -' ^ ne survival probability S(t) of the mRNA 
molecule at time t. This survival probability can be obtained by solving Eq. (|32l) using 
the method of characteristics (Appendix). We obtain 

S(r) = exp [-a (1 - e~ T ) - 0r] , (34) 

where we have defined the following dimensionless parameters: 

r={fi s + j)t, (35) 
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We can now proceed and calculate the generating function Gb(x) of the protein burst 
distribution. Since protein production occurs at a constant rate k p during the mRNA 
lifetime, the number of proteins produced by a surviving mRNA in time t is given by 
the Poisson distribution, with the corresponding generating function given by e kp ^ x ~ 1 ^ t . 
Since the difference S(t) — S(t + St) of survival probabilities is the probability that the 
mRNA degrades within the time interval {t,t + St}, we obtain the burst generating 
function as 



Jo 

Rewriting the burst size distribution in terms of dimensionless parameters results in the 
following integral form 



The burst distribution with sRNA regulation, Eq. (139]) . has some interesting 
features. We note that Eq. ( 159]) predicts that the burst distribution depends on three 
dimensionless parameters, a, (3, and k (equations f ]3"6"]) . ( 157j) . and fl4"0"]) ) and the steady- 
state distribution (see Eq. ( 121]) ) only adds a dependence on k m j \i v . Thus the modulation 
of any of the kinetic parameters shown in Figure H] (for fixed k m /fi p ) should result in 
the same steady-state distribution so long as the modifications occur in such a way 
that a, (3, and k remain constant (and model assumptions/ approximations are valid). 
As shown in Table 1, we can choose very different kinetic parameters that give rise to 
the same values for a, (3 and k and the prediction is that the burst and steady-state 
distributions for these different parameter choices should collapse onto a single curve. To 
test this scaling prediction, we carried out stochastic simulations based on the Gillespie 
algorithm [22] for a range of parameters such that a, (3, and k remain constant. From 
the simulations, we recorded the resulting steady-state distributions and compared it to 
the analytic result (see figure [5]). For the choice of parameters noted, we observed that 
the burst distribution is close to and can be well fitted by a geometric distribution. For 
a geometric distribution, the steady-state and burst generating functions are related by 
G s (x) = (Gb(x)) m . We used this approximation to obtain the analytical form of the 
steady-state generating function and derived the steady-state protein distribution P s (n) 
using this. As can be seen in Fig. H] the corresponding analytical results are in good 
agreement with results from simulations. The simulation results are also consistent with 
the scaling prediction since the curves with different parameter choices all collapse onto 






(39) 



where k is yet another dimensionless parameter 
k _ h p 



(40) 
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Figure 5. (color online) steady-state distributions with sRNA regulation. The 
dashed curve corresponds to Eq. (|2"Tj) using Eq. (|3"9")l with approximations (see text) 
and a ~ 4.76, (3 ~ 1.34, and k ~ 243.9. The other curves are the results from four 
sets of numeric simulations. See Table 1 for the values of the parameters used in the 
simulations. 



Simulation # 


k p 


k s 


/i S 


7 


1 


250 


0.400313 


0.072619 


0.952381 


2 


300 


0.717708 


0.122308 


1.107690 


3 


400 


1.378120 


0.217778 


1.422222 


4 


500 


2.055630 


0.310870 


1.739130 



Table 1. The values of the parameters used in the numeric simulations shown in figure 
El For all simulations, a ~ 4.76, f3 ~ 1.34, and k ~ 243.9. Also, fi m = 1, k m = 0.01, 
fi p = 0.005. 

a single curve. The small discrepancy between the theoretical results and simulations is 
attributed to the approximations made, specifically the approximation for G s (x) noted 
above which is strictly valid only if the burst distribution is geometric. The results 
obtained from simulations of individual bursts are in very good agreement with the 
corresponding theoretical predictions. 
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7. Summary 

Recent experiments underscore the need for connecting observed protein distributions 
from single-cell and single molecule studies using coarse-grained models of stochastic 
gene expression. In this context, several results have been derived in the present study 
which will help in the analysis of experimental results. We have shown how the functional 
form of the underlying mRNA burst distributions can be determined from observed pro- 
tein distributions. If the protein burst distribution is geometric then the corresponding 
mRNA burst distribution has to be a conditional geometric distribution. The derived 
results further show that if the promoter can be repressed such that observed pro- 
tein bursts arise from single mRNAs, then the underlying mRNA burst distribution in 
the unrepressed state can be completely determined. Furthermore, we derive relations 
connecting means and variances for burst and steady-state distributions for burst dis- 
tributions which can deviate from a geometric distribution. The results derived also 
provide insight into regulation of protein expression bursts by small RNAs. The general 
results derived in this work can thus be used for analysis of a wide range of models of 
gene expression. 

The authors acknowledge funding support from NSF (PHY-0957430) and from 
ICTAS, Virginia Tech. 

8. Appendix 

8.1. Relationship between mRNA and protein burst generating functions 

Let us define P(m, n; t) as the probability to find m mRNAs and n proteins after 
time t elapses since burst arrival. The corresponding generating function G p (x,y;t) = 
^ mn x m y n P(m,n\t) satisfies the following partial differential equation: 

d t G = fi m (l-x)d x G + k p (y-l)xd x G. (41) 

The equation above can be easily solved by the method of characteristics 

G(x, y; t) = G m [ - ~ (1 + p)gg^ l , (42) 

where G m [.] is generating function of mRNAs at t — 0, and we defined 

Y = l-^(y-l). (43) 

Therefore, the time dependent distribution of proteins in the burst is given by 
generating function 

"1 - (1 + Y)e-^ Yt ~ 



G b (y;t) = G(l,y;t) = G n . 

and the corresponding steady state is simply 

1 



G b (y) = G r 



Y 



(44) 



(45) 
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which is identical to the equation in the main text (Eq. (jlip ). 
8.2. Two stage model 

Assume that the mRNA production rate k m has a value k\ in the state 1 and a value k 2 
in the state 2. The state 1 switches with probability A12 into the state 2 and back with 
probability A21. One gets the following set of the equations for the generating functions 
Gi (2) (x,t): 

d t G x = h [G b {x) - 1] Gi + // p (l - x)d x G 1 - A 12 Gx + A 2 iG 2 , (46) 
d t G 2 = k 2 [G b (x) - 1] G 2 + /i p (l - x)d x G 2 + A 12 Gx - A 2 iG 2 . (47) 

Let us explicitly calculate two moments of the protein's steady-state distribution. By 
setting x — 1, t — > 00 we get 

= A 2 iP 2 s , (48) 
PI + P$ = 1, (49) 

where P" = Gi(l,oo) and P 2 S = G 2 (l,oo) are steady-state probabilities to be in the 
states 1 and 2 accordingly. Therefore, one derives 

pi = ( s °) 

Pi = y^-- (5i) 

Al 2 + A 2 1 

By evaluating the first derivative with respect to x at point x — 1 one can calculate 
( n i) = Y,n=o nP i( n )' « = 1,2: 

= k x {n b )Pl - /Up(ni) - Ai 2 <n 1 > + A 2 i(n 2 ), (52) 
= k 2 (n b )P 2 - fi p (n 2 ) + Aia(ni) - A 21 (n 2 ). (53) 

Similarly, by evaluating the second order derivative with respect to x at point x — 1 
one obtains 

= ki [v b Pl + 2(n b ) (m)] - 2/ipAi - A12V1 + A 2 iv 2 , (54) 
= A; 2 KP 2 + 2(n b ) (n 2 )] - 2/i p A 2 + A^i - A 2 iw 2 , (55) 
where we defined 

Vi = ^2n{n-l)Pi{n) } 2 = 1,2 (56) 

n=0 

00 

u 6 = 53n(n-l)A(n). (57) 

n=0 

Hence, the average number of proteins in the steady-state and the variance can be 
derived by solving equations Eqs. ( 154153]) (result is given by the expressions Eqs. ( I2"4"f2"5j) 
in the main text.) 
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8.3. Derivation of survival probability for small RNA based regulation 

Solution of the equation Eq. ( |32|) using method of characteristics is given by 

k s (x - 1) fc s 7 



G\(x, t) = exp 



(5f 



7 + /4 (l + UsY 

where /3 and r are dimensionless parameters as defined in the main text and the function 
g(z) needs to be determined from the initial condition Eq. (133 p . Its argument is given 
by 



7 



{x-l) + 

By matching the initial condition one gets 



g(z) = exp 



1 + 



exp 



n s z 



(59) 



(60) 



1 + 

Finally, since we are interested in the quantity S(t) = Gi(l,t) (survival probability), we 
obtain 

7 __ T 



z -> 



1 + 

S{t) = exp 



-Pt + 



9W- 



(7 + PsY 

from which the equation Eq. (I34I) from the main text can be obtained. 



(61) 
(62) 
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